Scene Text Detection via Holistic, Multi-Channel Prediction

نویسندگان

  • Cong Yao
  • Xiang Bai
  • Nong Sang
  • Xinyu Zhou
  • Shuchang Zhou
  • Zhimin Cao
چکیده

Recently, scene text detection has become an active research topic in computer vision and document analysis, because of its great importance and significant challenge. However, vast majority of the existing methods detect text within local regions, typically through extracting character, word or line level candidates followed by candidate aggregation and false positive elimination, which potentially exclude the effect of wide-scope and long-range contextual cues in the scene. To take full advantage of the rich information available in the whole natural image, we propose to localize text in a holistic manner, by casting scene text detection as a semantic segmentation problem. The proposed algorithm directly runs on full images and produces global, pixel-wise prediction maps, in which detections are subsequently formed. To better make use of the properties of text, three types of information regarding text region, individual characters and their relationship are estimated, with a single Fully Convolutional Network (FCN) model. With such predictions of text properties, the proposed algorithm can simultaneously handle horizontal, multi-oriented and curved text in real-world natural images. The experiments on standard benchmarks, including ICDAR 2013, ICDAR 2015 and MSRA-TD500, demonstrate that the proposed algorithm substantially outperforms previous state-ofthe-art approaches. Moreover, we report the first baseline result on the recently-released, large-scale dataset COCO-Text. Keywords—Scene text detection, fully convolutional network, holistic prediction, natural images.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Fused Text Segmentation Networks for Multi-oriented Scene Text Detection

In this paper, we introduce a novel end-end framework for multi-oriented scene text detection from an instanceaware segmentation perspective. We present Fused Text Segmentation Networks, which combine multi-level features during feature extracting as text instance may rely on finer feature expression compared to general objects. It detects and segments the text instance jointly and simultaneous...

متن کامل

Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation

Previous deep learning based state-of-the-art scene text detection methods can be roughly classified into two categories. The first category treats scene text as a type of general objects and follows general object detection paradigm to localize scene text by regressing the text box locations, but troubled by the arbitrary-orientation and large aspect ratios of scene text. The second one segmen...

متن کامل

PixelLink: Detecting Scene Text via Instance Segmentation

Most state-of-the-art scene text detection algorithms are deep learning based methods that depend on bounding box regression and perform at least two kinds of predictions: text/nontext classification and location regression. Regression plays a key role in the acquisition of bounding boxes in these methods, but it is not indispensable because text/non-text prediction can also be considered as a ...

متن کامل

Text Scanner with Text Detection Technology on Image Sequences

We propose a text scanner, which detects wide text strings in a sequence of scene images. For scene text detection, we use a multiple-CAMShift algorithm on a text probability image produced by a multi-layer perceptron. To provide enhanced resolution of the extracted text images, we perform the text detection process after generating a mosaic image in a fast and robust image registration

متن کامل

Scene Text Detection in Video by Learning Locally and Globally

There are a variety of grand challenges for text extraction in scene videos by robots and users, e.g., heterogeneous background, varied text, nonuniform illumination, arbitrary motion and poor contrast. Most previous video text detection methods are investigated with local information, i.e., within individual frames, with limited performance. In this paper, we propose a unified tracking based t...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1606.09002  شماره 

صفحات  -

تاریخ انتشار 2016